Dora the Explorer: Directed Outreaching Reinforcement Action-selection
Not recorded
Abstract
Exploration is a fundamental aspect of Reinforcement Learning. Two key challenges are how to focus exploration on more valuable states, and how to direct exploration toward gaining new world knowledge. Visit-counters have proven useful both in practice and in theory for directed exploration. However, a major limitation of counters is their locality: they consider only the immediate one-step exploration value. While there are a few model-based solutions to this difficulty, a model-free approach is still missing. We propose E-values, a generalization of counters that can be used to evaluate the propagating exploratory value over state-action trajectories. We compare our approach to commonly used RL techniques, and show that using E-values improves learning and performance over traditional counters. We also show how our method can be implemented with function approximation to learn continuous MDPs.
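The abstract does not state the E-value update rule, so the following is only a minimal sketch under stated assumptions: E-values start at 1 and are updated with a SARSA-style rule that uses zero reward, and a generalized counter is read off as the logarithm of E in base (1 - alpha). The names `update_e_value`, `generalized_counter`, `alpha`, and `gamma_e` are hypothetical and chosen for illustration. With `gamma_e = 0` the sketch reduces to an ordinary visit counter, which is the sense in which E-values generalize counters.

```python
import math
from collections import defaultdict


def update_e_value(E, s, a, s_next, a_next, alpha=0.1, gamma_e=0.9):
    """One SARSA-style update with zero reward (assumed rule, not from the abstract)."""
    E[(s, a)] = (1 - alpha) * E[(s, a)] + alpha * gamma_e * E[(s_next, a_next)]
    return E[(s, a)]


def generalized_counter(e_value, alpha=0.1):
    """Logarithm of the E-value in base (1 - alpha); equals 0 when E is still 1."""
    return math.log(e_value) / math.log(1 - alpha)


# Toy check: with gamma_e = 0 the generalized counter equals the visit count.
E_local = defaultdict(lambda: 1.0)  # assumed initialization: every E-value starts at 1
for _ in range(3):
    update_e_value(E_local, "s0", "a0", "s1", "a0", alpha=0.1, gamma_e=0.0)
local_count = generalized_counter(E_local[("s0", "a0")], alpha=0.1)

# With gamma_e > 0, an unexplored successor (E = 1) keeps the counter low,
# so the pair still looks worth exploring after the same three visits.
E_prop = defaultdict(lambda: 1.0)
for _ in range(3):
    update_e_value(E_prop, "s0", "a0", "s1", "a0", alpha=0.1, gamma_e=0.9)
propagated_count = generalized_counter(E_prop[("s0", "a0")], alpha=0.1)
```

In this toy run `local_count` matches the three visits, while `propagated_count` stays well below three because the successor pair has never been tried. This is one way to read "propagating exploratory value over state-action trajectories".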
Similar resources
A Computational Model of Cortico-Striato-Thalamic Circuits in Goal-Directed Behaviour
A connectionist model of cortico-striato-thalamic loops unifying learning and action selection is proposed. The aim is to develop a simple model that reveals the mechanisms behind the cognitive process of goal-directed behaviour, rather than merely to obtain a model of neural structures. In the proposed connectionist model, the action selection is realized by a ...
A Novel Structure for Realizing Goal-directed Behavior
Intelligent organisms complete goal-directed behaviour by accomplishing a series of cognitive processes. Inspired by these cognitive processes, this work introduces a novel structure composed of Adaptive Resonance Theory and an Action Selection module. This novel structure is capable of recognizing task-relevant patterns and choosing task-relevant actions to complete goal-directed behavi...
RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
The principal aim of a search engine is to provide results sorted according to the user's requirements. To achieve this aim, it employs ranking methods to rank web documents based on their significance and relevance to the user query. The novelty of this paper is a user-feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...